Speech-to-Speech
Interspeech 2024
Voice AI apps
Speech processing
Speech engineering / acoustic engineering
ElevenLabs
https://elevenlabs.io/docs/speech-synthesis/speech-to-speech
https://github.com/huggingface/speech-to-speech
Hume AI
https://www.hume.ai/
Recent speech language models
https://drive.google.com/file/d/1O5PKFl6fhLXyZVCdFmbmfDFiRGQeds_8/view
GLM-4-Voice
https://github.com/THUDM/GLM-4-Voice
Trying out J-Moshi
https://note.com/schroneko/n/n6b7a95742ab2
Moshi: a speech-text foundation model for real-time dialogue
https://arxiv.org/abs/2410.00037
Soundwave: Less is More for Speech-Text Alignment in LLMs
https://arxiv.org/abs/2502.12900
Crossing the uncanny valley of conversational voice
https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice
Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model
https://arxiv.org/html/2505.15670v1
Using the smart-turn turn-detection model to judge in real time whether the user is still speaking
https://ayousanz.hatenadiary.jp/entry/2025/10/12/231156
MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance
https://arxiv.org/abs/2510.00499
Thai Semantic End-of-Turn Detection for Real-Time Voice Agents
https://arxiv.org/abs/2510.04016